Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system.
Comment: Accepted at EMNLP 2017
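To make the unfolding step concrete, here is a minimal NumPy sketch (illustrative only, not the paper's implementation): two toy single-hidden-layer models with the same topology are merged into one larger network whose block-stacked weights reproduce the ensemble average in a single pass. The paper averages post-softmax predictions; the linear output used here keeps the equivalence exact for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 5, 3

# Two separately "trained" toy models (random weights as stand-ins).
W1_a, W2_a = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_out, d_hid))
W1_b, W2_b = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_out, d_hid))
x = rng.normal(size=d_in)

def model(W1, W2, x):
    return W2 @ np.tanh(W1 @ x)

# Conventional ensembling: run both nets, then average.
ensemble_out = 0.5 * (model(W1_a, W2_a, x) + model(W1_b, W2_b, x))

# Unfolding: stack the hidden layers into one big matrix (one GPU matmul)
# and fold the averaging into halved, concatenated output weights.
W1_unf = np.vstack([W1_a, W1_b])              # (2*d_hid, d_in)
W2_unf = np.hstack([0.5 * W2_a, 0.5 * W2_b])  # (d_out, 2*d_hid)
unfolded_out = model(W1_unf, W2_unf, x)

assert np.allclose(ensemble_out, unfolded_out)
```

The shrinking techniques described in the paper then reduce the dimensionality of the stacked layers, which this sketch does not attempt.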
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were tailored much more closely to the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to that paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common NMT pathologies such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often play no role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/1
The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16
This paper presents the University of Cambridge submission to WMT16.
Motivated by the complementary nature of syntactic machine translation and
neural machine translation (NMT), we exploit the synergies of Hiero and NMT in
different combination schemes. Starting out with a simple neural lattice
rescoring approach, we show that the Hiero lattices are often too narrow for
NMT ensembles. Therefore, instead of a hard restriction of the NMT search space
to the lattice, we propose to loosely couple NMT and Hiero by composition with
a modified version of the edit distance transducer. The loose combination
outperforms lattice rescoring, especially when using multiple NMT systems in an
ensemble.
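The loose coupling can be illustrated with a hedged sketch (not the actual FST composition used in the submission, which composes the NMT search space with a modified edit distance transducer): each NMT hypothesis keeps its model score but pays a per-edit penalty for straying from the closest Hiero lattice path, rather than being hard-restricted to the lattice. The weight `alpha` is a hypothetical tuning parameter, not a value from the paper.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance over token sequences."""
    dp = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tok_b in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,                 # delete tok_a
                                     dp[j - 1] + 1,             # insert tok_b
                                     prev + (tok_a != tok_b))   # substitute
    return dp[-1]

def loosely_coupled_score(nmt_score, hypothesis, lattice_paths, alpha=1.0):
    """Combine the NMT model score with an edit-distance penalty to the
    nearest Hiero lattice path (higher is better)."""
    penalty = min(edit_distance(hypothesis, path) for path in lattice_paths)
    return nmt_score - alpha * penalty
```

Enumerating all lattice paths is of course intractable for real lattices; the transducer composition in the paper achieves the same kind of penalty while operating directly on the lattice.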
UCAM Biomedical Translation at WMT19: Transfer Learning Multi-domain Ensembles
The 2019 WMT Biomedical translation task involved translating Medline
abstracts. We approached this task using transfer learning to obtain a series of
strong neural models on distinct domains, combining them into multi-domain
ensembles. We further experiment with an adaptive language-model ensemble
weighting scheme. Our submission achieved the best submitted results on both
directions of English-Spanish.
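One way such adaptive weighting can work is sketched below; this is an illustrative scheme, not necessarily the submission's exact method, and `lm_logprob` is an assumed helper returning a domain language model's log-probability for the source sentence. Each domain-specific ensemble member is weighted by how well its domain's language model fits the input.

```python
import math

def adaptive_weights(src_tokens, domains, lm_logprob):
    """Softmax over per-domain LM log-likelihoods of the source sentence."""
    scores = [lm_logprob(domain, src_tokens) for domain in domains]
    m = max(scores)                          # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def ensemble_step(token_probs_per_model, weights):
    """Weighted average of the next-token distributions (dicts token -> prob)
    predicted by each domain-specific model at one decoding step."""
    combined = {}
    for probs, w in zip(token_probs_per_model, weights):
        for tok, p in probs.items():
            combined[tok] = combined.get(tok, 0.0) + w * p
    return combined
```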
An Operation Sequence Model for Explainable Neural Machine Translation
We propose to achieve explainable neural machine translation (NMT) by
changing the output representation to explain itself. We present a novel
approach to NMT which generates the target sentence by monotonically walking
through the source sentence. Word reordering is modeled by operations which
allow setting markers in the target sentence and moving a target-side write head
between those markers. In contrast to many modern neural models, our system
emits explicit word alignment information which is often crucial to practical
machine translation as it improves explainability. Our technique can outperform
a plain text system in terms of BLEU score under the recent Transformer
architecture on Japanese-English and Portuguese-English, and is within 0.5 BLEU
on Spanish-English.
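A toy interpreter makes the operation-sequence idea concrete. The operation names below are modeled on the paper's, but the semantics are a deliberate simplification for illustration: target tokens are written into compartments separated by markers, jump operations move the write head between compartments, and SRC_POP advances monotonically through the source, yielding an explicit word alignment as a by-product.

```python
def execute(ops):
    """Run a sequence of operations; any op that is not a control symbol
    is treated as a target token to write at the current head position."""
    compartments = [[]]   # target sentence, split by markers
    head = 0              # compartment currently under the write head
    src_pos = 0           # monotone source read head
    alignment = []        # (source position, target token) pairs
    for op in ops:
        if op == "SRC_POP":          # advance to the next source word
            src_pos += 1
        elif op == "SET_MARKER":     # open a new compartment after the head
            compartments.insert(head + 1, [])
        elif op == "JMP_FWD":        # move write head to the next compartment
            head += 1
        elif op == "JMP_BWD":        # move write head to the previous one
            head -= 1
        else:                        # emit a target token
            compartments[head].append(op)
            alignment.append((src_pos, op))
    target = [tok for comp in compartments for tok in comp]
    return target, alignment
```

For example, executing ["SET_MARKER", "JMP_FWD", "world", "JMP_BWD", "hello"] produces the target "hello world" even though "world" was emitted first; the recorded (source position, token) pairs are the explicit alignment information the abstract refers to.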